In a large-scale quantum computer, the cost of communications will dominate the performance and
resource requirements, place many severe demands on the technology, and constrain the architecture. 
Unfortunately, fault-tolerant computers based entirely on photons with probabilistic gates, though 
equipped with "built-in" communication, have very large resource overheads; likewise, computers 
with reliable probabilistic gates between photons or quantum memories may lack sufficient commu- 
nication resources in the presence of realistic optical losses. Here, we consider a compromise archi- 
tecture, in which semiconductor spin qubits are coupled by bright laser pulses through nanophotonic 
waveguides and cavities using a combination of frequent probabilistic and sparse deterministic
entanglement mechanisms. The large photonic resource requirements incurred by the use of probabilistic
gates for quantum communication are mitigated in part by the potential high-speed operation of the 
semiconductor nanophotonic hardware. The system employs topological cluster-state quantum error 
correction for achieving fault-tolerance. Our results suggest that such an architecture/technology com- 
bination has the potential to scale to a system capable of attacking classically intractable computational 
problems. 

Keywords: distributed quantum computation; topological fault tolerance; quantum multicomputer; 
nanophotonics. 

1. Introduction 

Small quantum computers are not easy to build, but are certainly possible. For these, it
is sufficient to consider the five basic DiVincenzo criteria: ability to add qubits,
high-fidelity initialization and measurement, low decoherence, and a universal set of quantum
gates. However, these criteria are insufficient for a large-scale quantum computer. DiVin- 
cenzo's added two communications criteria — the ability to convert between stationary 
and mobile qubit representations, and to faithfully transport the mobile ones from one lo- 
cation to another and convert back to the stationary representation — are also critical, but 
so is gate speed ("clock rate"), the parallel execution of gates, the necessity for feasible 
large-scale classical control systems and feed-forward control, and the overriding issues of 
manufacturing, including the reproducibility of structures that affect key tuning
parameters. In light of these considerations, the prospects for large-scale quantum computing
are less certain.

Advances in understanding what constitutes an attractive technology for a quantum 
computer are married to advances in quantum error correction. These improvements include
the theoretical thresholds below which the application of quantum error correction
actually improves the error rate of the system, increases in the applicability of known

Among the most important, and radical, new ideas in quantum error correction is topological
quantum error correction (tQEC), for example surface codes [34,35,36,37,38]. These
codes are attracting attention due to their high error thresholds and their minimal demands 
on interconnect geometries, but work has just begun on understanding the impact of tQEC 
on quantum computer architecture, including determining the hardware resources necessary
and the performance to be expected.

The effective fault tolerance threshold in tQEC depends critically on the microarchitec- 
ture of a system, principally the set of qubits which can be regarded as direct neighbors of 
each qubit. As connectivity between qubits increases, both the operations required to ex- 
ecute error correction and the opportunities for "crosstalk" as sensitive qubits are directly 
exchanged decline, allowing the system to more closely approach theoretical limits. 

Here, we argue that even for tQEC schemes that require only nearest-neighbor quan- 
tum gates in a two-dimensional lattice geometry, communication resources will continue 
to be critical. We present an architecture sketch in which efficient quantum communication
is used to compensate for architecture inhomogeneities, such as physical qubits which
must be separated by large effective distances due to hardware constraints, or qubits
missing from the lattice because of manufacturing defects. Assuming a homogeneous
architecture may be acceptable for small-scale systems, but in order to create a system that 
will grow to solve practical, real-world problems, distributed computation and a focus on 
the necessary communications is required. Further, our design explicitly recognizes that 
not all communications channels are identical; they vary in the fidelity of created entan- 
glement and physical and temporal resources required. This philosophy borrows heavily 
from established principles in classical computer architecture [53]. Classically, satisfying
the demands of data communication is one of the key activities of system architects.
Our design process incorporates this philosophy.

No computing system can be designed without first considering its target workload
and performance goals. The level of imperfection we allow for quantum operations
depends heavily on the application workload of the computer. Our goal is the detailed
design (and ultimately implementation) of a large-scale system: more than ten thousand
logical qubits capable of running $10^{11}$ Toffoli gates within a reasonable time (days or
at most a few months). For example, such a system could factor a 2,000-bit number using
Shor's algorithm. This choice of scale affects the amount of error in quantum operations
that we can tolerate. Steane analyzes the strength of error resilience in a system in terms 
of KQ, the product of the number of logical qubits in an application (Q) and the depth 
(execution time, measured in Toffoli gate times) of the application (K). Our goal is to
tune the error management system of our computer to achieve a logical error per Toffoli
gate executed of $p_L < 1/KQ$, with $KQ \sim 10^{15}$.
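
As a rough illustration of the KQ budget described above, the following Python sketch computes the target logical error rate per Toffoli gate from the qubit count and circuit depth. The function name and the round-number inputs (10^4 logical qubits, 10^11 Toffoli gates) are illustrative assumptions based on the scale quoted in the text, not a specification of the design.

```python
def logical_error_budget(logical_qubits: float, toffoli_depth: float) -> tuple[float, float]:
    """Return (KQ, upper bound on the logical error per Toffoli gate).

    logical_qubits is Q and toffoli_depth is K in Steane's KQ product.
    """
    kq = logical_qubits * toffoli_depth  # KQ = (number of logical qubits) x (depth)
    return kq, 1.0 / kq                  # the design target is p_L < 1/KQ


# Hypothetical round numbers taken from the scale discussed in the text.
kq, p_l_max = logical_error_budget(1e4, 1e11)
print(f"KQ = {kq:.0e}, p_L must be below {p_l_max:.0e}")
```

Larger applications tighten the budget proportionally: doubling either the number of logical qubits or the depth halves the tolerable logical error per gate.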

Under most realistic technological assumptions, the resources required to reach ade- 
quate KQ values are huge. Nearly all proposed matter qubits are at least microns in size, 
when control hardware is included. For chip-based systems, a simple counting argument 
demonstrates that more qubits are required than will fit in a single die, or even a single 
wafer. This argument forces the implementation to adopt a distributed architecture, and so 
we require that a useful technology have the ability to entangle qubits between chips.

As an example architecture supporting rich communications, we are designing a device 
based on semiconductor nanophotonics, using the spin of an unpaired electron in a semi- 
conductor quantum dot as our qubit, with two-qubit interactions mediated via cavity QED. 
We plan to use tQEC to manage run-time, soft faults, and to design the architecture to be 
inherently tolerant of fabricated and grown defects in most components. 

Our overall architecture is a quantum multicomputer, a distributed-memory system 
with a large number of nodes that communicate through a multi-level interconnect. The 
distributed nature will allow the system to scale, circumventing a number of issues that 
would otherwise place severe constraints on the maximum size and speed of the system, 
hence limiting problems for which the system will be suitable. 

Within this idiom, many designs will be possible. The work we present here represents 
a solid step toward a complete design, giving a framework for moving from the overall 
multicomputer architecture toward detailed node design. We can now begin to estimate the 
actual hardware resources required, as well as establish goals (such as the necessary gate 
fidelity and memory lifetimes) for the development of the underlying technology. 

Section 2 presents background on the techniques for handling errors in a quantum
computer that we propose to use. Section 3 qualitatively presents our hardware building
blocks: semiconductor quantum dots, nanophotonic cavities and waveguides, and the optical
schemes for executing gates. Section 4 presents a qualitative description of the resources
employed in the complete system. In particular, it describes how some quantum
dots, used for communication, are arranged for deterministic quantum logic mediated by
coupled cavity modes, while other quantum dots are indirectly coupled via straight,
cavity-coupled waveguides for purification-enhanced entanglement creation. Long columns of
these basic building blocks span the surface of a chip, and many chips are coupled together
to create the complete multicomputer. Preliminary quantitative resource counts appear in
Section 5.

2. Multi-level Error Management 

A computer system is subject to both soft faults and hard faults; in the quantum computing 
literature, "fault tolerance" refers to soft faults. A soft fault is an error in the operation of a 
normally reliable component. Soft faults can be further divided into errors on the quantum 
state (managed through dynamically-executed quantum error correction or purification), 
and the loss of qubit carrier (e.g., loss of a photon, ion or the electron in a quantum dot, 
depending on the qubit technology). Qubit loss may be addressed by using erasure codes,
or, in the case of tQEC, through special techniques for rebuilding the lattice state. In
this section, we introduce our approach to managing these multiple levels of errors, which 
will be further developed in the following sections. 



2.1. Defect Tolerance and Quantum Communication 

Hard faults are either manufactured or "grown" defects (devices that stop working dur- 
ing the operational lifetime of the system). With adequate hardware connectivity, flexible 
software-based assignment of roles to qubits will add hard fault tolerance, allowing the 
system to deal with both manufactured and grown defects. 

The percentage of devices that work properly is called the yield. In our system, most 
of the components are expected to have high yields, but the quantum dots themselves will 
likely have low yields, at least in initial fabrication runs and possibly in ultimate devices. 
These faults occur in part due to the difficulty of growing optically active quantum dots 
in prescribed locations, but more due to the difficulty of assuring each dot is appropriately 
charged and tuned near the optical wavelength of the surrounding nanophotonic hardware, 
to be further discussed in Sec. 3.3.

The presence of hard faults means that the connectivity of the quantum computer begins 
in a random configuration, which we can determine by device testing. As a result, the 
architecture will have an inhomogeneous combination of high-fidelity connections where 
pairs of neighboring qubits are good and low-fidelity connections between more distant 
qubits. To compensate for the low-fidelity connections, we choose to use entanglement 
purification to bring long-distance entangled-states up to the fidelity we desire for building 
our complete tQEC lattice. This choice means that the system will naturally use many of
the techniques developed for quantum repeaters, and portions of the system will
require similar computation and communication resources, used in a continuous fashion.
Details of these procedures are presented in Sec. 4.
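
The benefit of purification can be illustrated with a textbook recurrence. The sketch below uses the standard BBPSSW recurrence for Werner states, which is swapped in here purely for illustration; it is not necessarily the protocol used in this architecture. The function names and the starting fidelity are hypothetical, and ideal local gates are assumed.

```python
def bbpssw_step(f: float) -> tuple[float, float]:
    """One BBPSSW round on two Werner pairs of equal fidelity f.

    Returns (output fidelity, success probability). Assumes ideal local
    operations, so all infidelity comes from the input pairs.
    """
    e = (1.0 - f) / 3.0                            # weight of each unwanted Bell state
    p_success = f * f + 2.0 * f * e + 5.0 * e * e  # probability the round is kept
    f_out = (f * f + e * e) / p_success
    return f_out, p_success


def rounds_to_reach(f_start: float, f_target: float, max_rounds: int = 100) -> int:
    """Count purification rounds needed to raise f_start to at least f_target."""
    if f_start <= 0.5:
        raise ValueError("BBPSSW only improves Werner states with fidelity above 0.5")
    f, rounds = f_start, 0
    while f < f_target:
        if rounds >= max_rounds:
            raise RuntimeError("target fidelity not reached within max_rounds")
        f, _ = bbpssw_step(f)
        rounds += 1
    return rounds


f_out, p_ok = bbpssw_step(0.75)
print(f"F = 0.75 -> {f_out:.4f} with success probability {p_ok:.4f}")
```

Each round consumes two pairs and is kept only with probability `p_success`, so the number of raw low-fidelity pairs grows at least geometrically with the number of rounds. This is one reason the purification portions of the system need communication resources continuously.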



2.2. Topological Fault Tolerance 

On top of purified states, we employ topological error correction (tQEC) [34,35,36,37], in
particular the two-dimensional scheme introduced by Raussendorf and Harrington. In this scheme,
the action of the quantum computer is the sequential generation and detection of a cluster
state, and error correction proceeds by checking against expected quantum correlations for
that state. Logical qubits are defined by deliberately altering these correlations at a pair
of boundaries in an effectively three-dimensional lattice of physical qubits. These boundaries
may be the extremities of the lattice or holes^a of various shapes "cut" into the lattice by
choosing not to entangle some qubits. The qubits in the interior of the lattice have their
state tightly constrained, whereas pairs of boundaries are associated with a degree of freedom
that is used as the logical qubit.

a These holes are commonly called "defects" in the topological computing literature, as they
are similar to defects in a crystal; in this paper, we reserve the term "defect" for a qubit
that does not function properly, i.e. a manufacturing defect.

The simplicity of the gate sequences used to constrain the qubits in the lattice interior 
and the independence of these gate sequences on the size of the system are directly
responsible for tQEC's high threshold error rate of approximately 0.8% for preparation, gate,
storage and measurement errors [35,53], the highest threshold found to date for a system
with only nearest neighbor interactions.

In 2-D, we choose to make holes that are squares of side length d. Logical operators 
take the form of rings and chains of single-qubit operators — chains connect pairs of 
holes, rings encircle one of the holes. If we associate $X_L$ with chains and $Z_L$ with rings
(or vice versa), it can be seen that these operators will always intersect an odd number of
times, ensuring anticommutation. Braiding holes around one another can implement logical
CNOT, as shown in Figure 1.







Fig. 1. Logical qubits in topologically error-corrected systems are represented by unentangled "holes" in a highly
entangled cluster state on a lattice. The lattice itself is not shown; the squares represent the holes. a.) A single
logical qubit is associated with two holes. Logical operators are rings and chains of single qubit operators. b.)
Moving holes around one another by changing the error correction circuits on the boundary of holes results in the
deformation and ultimately braiding of logical operators. c.) Equivalent form of the braided logical operators after
pinching together sections, and thus cancelling these sections, to form disjoint rings and chains. The mapping of
logical operators represents logical CNOT with the left logical qubit as control.

tQEC offers important architectural advantages over other error-suppression schemes, 
such as concatenated codes. Most importantly, unlike tQEC, many concatenated codes lose 
much of their effectiveness when long-distance gates are precluded by the underlying tech- 
nology. In addition, the amount of error correction applied in tQEC can be controlled more 
finely than with concatenated codes, which have a property that every time an additional 
level of error correction is used, the number of physical qubits grows by at least an order 
of magnitude. tQEC's error-protection strength, in contrast, improves incrementally with 
each additional row and column added to the lattice. 

Logical errors are exponentially suppressed by increasing the circumference and separation
of holes. This can be inferred directly from Figure 1: the number of physical qubit
errors required to form an unwanted logical operation grows linearly with circumference
and separation. The threshold error rate $p_{th}$ is defined to be the error rate at which
increasing the resources devoted to error correction neither increases nor decreases the
logical error, that is, the error rate at which the errors corrected are balanced by the
errors introduced by the error correction circuitry. Assuming a hole circumference and
separation of $4d$, for physical error rates $p < p_{th}$, error suppression of order
$O((p/p_{th})^{\alpha d})$ will be observed. The factor $\alpha$ depends on the details of the
error correction circuits. Assuming the error correction circuits do not copy single errors
to multiple locations, $\alpha \sim 2$, as a circumference of $4d$ implies that a chain of
approximately $2d$ errors can occur before our error correction system will mis-correct the
state and give a logical error.

Related tQEC schemes exist in 3-D and 2-D. The 3-D scheme makes use
of a 3-D cluster state and the measurement-based approach to computing — all qubits are 
measured in various bases, and measurement results processed to determine both the bases 
of future measurements and the final result of the computation. This approach is well- 
suited to a technology with short-lived qubits (e.g., photons, which are easily lost) or slow 
measurement. The 2-D scheme requires a 2-D square lattice of qubits that are not easily 
lost plus fast measurement. Given these two properties, the threshold is slightly higher than 
the 3-D case and certain operations, such as logical measurement, can be performed more 
quickly. Barring these minor caveats, the 2-D scheme is a simulation of the 3-D scheme, 
in which one dimension of the 3-D lattice becomes time. 



2.3. Logical Gates in Topological Error-Corrected Systems 

When making use of topological error correction, only a small number of single logical 
qubit gates are possible, namely $X_L$, $Z_L$ and logical initialization and measurement in
these bases. Logical initialization and measurement in the $X_L$ and $Z_L$ bases can be
implemented using initialization and measurement of regions of single qubits encompassing
the holes in the X and Z bases. The only possible multiple logical qubit gate, logical
CNOT, can be implemented by braiding the correct type of holes in a prescribed manner
as shown in Figure 1. This set of gates is not universal.

To achieve universality, rotations by $\pi/2$ and $\pi/4$ around the $X_L$ and $Z_L$ axes can be
added to the logical gate set. These gates, however, require the use of specially-prepared
S states, where $|S\rangle = |0\rangle + e^{i\theta}|1\rangle$, $\theta = \pi/2, \pi/4$. Fault-tolerant
creation of the S states involves use of the concatenated decoding circuits for the 7-qubit
Steane code and 15-qubit Reed-Muller code respectively to distill a set of low-fidelity S states
into a single higher-fidelity one. Convergence is rapid: if the input states have average
probability of error $p$, the output states will have error probabilities of $7p^3$ and
$35p^3$ respectively.
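
The convergence quoted above can be tabulated with a short Python sketch. It iterates the leading-order maps $p \to 7p^3$ and $p \to 35p^3$ and counts raw input states, assuming each level consumes exactly 7 or 15 inputs per output with no overhead for discarded rounds. The helper names and the 1% input error are hypothetical.

```python
def distill_error(p_in: float, levels: int, coeff: int) -> float:
    """Apply the leading-order map p -> coeff * p**3 once per distillation level."""
    p = p_in
    for _ in range(levels):
        p = coeff * p ** 3
    return p


def raw_states_needed(levels: int, block_size: int) -> int:
    """Raw input states per output state, ignoring discarded (failed) rounds."""
    return block_size ** levels


STEANE = (7, 7)         # (coefficient, block size) for the 7-qubit Steane code
REED_MULLER = (35, 15)  # (coefficient, block size) for the 15-qubit Reed-Muller code

for levels in range(3):
    for name, (coeff, block) in (("Steane", STEANE), ("Reed-Muller", REED_MULLER)):
        p_out = distill_error(0.01, levels, coeff)
        n_raw = raw_states_needed(levels, block)
        print(f"{name:12s} levels={levels} p_out={p_out:.3e} raw states={n_raw}")
```

Under these assumptions two levels already consume 49 or 225 raw states per output state, which illustrates why state distillation accounts for a large share of the logical qubits.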

This implies that for most input error rates, two levels of concatenation will be more 
than sufficient. Nevertheless, this still represents a large number of logical qubits, implying 
the need for S factories throughout the computer and the dedication of most of the qubits 
in the computer to generate the necessary S states at a sufficient rate. This will impact the 
resource counting for our target application, as we discuss in Section 5.

When using an S state, the actual gate applied will be a random rotation by either $+\theta$
or $-\theta$. Error corrected logical measurement must be used to determine which gate was
applied and hence whether a corrective $2\theta$ gate also needs to be applied. If
$2\theta = \pi/2$, the correction must be applied before further gates are applied, introducing
a temporal gate ordering. This time ordering prevents arbitrary quantum circuits involving
non-Clifford group gates being implemented in constant time.

3. Hardware Elements 

In considering the hardware in which to implement this architecture, by far the most
important pending question is the choice of quantum dot type, which will also determine the
semiconductor substrate and operational wavelengths. 

3.1. Quantum Dots 

The best type of quantum dot to employ remains an open question. Charged, self-assembled
InGaAs quantum dots in GaAs are appealing due to their high oscillator strength and
near-IR wavelength. These dots have been engineered into cavities in the strong coupling
regime, and recent experiments have demonstrated complete ultrafast optical control
of a single electron spin qubit trapped in the dot. However, it is challenging to make
high-yield CQED devices from these dots due to their high inhomogeneous broadening and
the challenges of site selectivity, although progress continues in designing tunable
quantum dots. Sufficient homogeneity for a scalable system, however, may require a more
homogeneous kind of quantum dot, such as those defined by a single donor impurity and its
associated donor-bound-exciton state. Donor-bound excitons in high quality silicon and GaAs
are remarkably homogeneous, both in their optical transitions and in the Larmor frequencies
of the bound spin providing the qubit. However, the isolation of single donors in these
systems has been challenging. Donor impurities in silicon would seem almost ideal, since
isotopic purification can give long spin coherence times and extremely homogeneous optical
transitions, but optical control in this system is hindered by silicon's indirect band-gap.
A II-VI semiconductor such as ZnSe may provide a nearly ideal compromise: single fluorine
impurities in ZnSe have been isolated, shown to have a comparable oscillator strength to
quantum dots, and incorporated into microcavities. Recently, sufficient homogeneity has been
available to observe interference from photons from independent devices. However, this
system comes with its own challenges, such as the less convenient blue emission wavelength.
Nitrogen-vacancy centers in diamond have also attracted heavy attention recently, but the
diamond substrate remains a challenging one for implementing the nanophotonic hardware that
supports the quantum computer.

Regardless of the type of quantum dot, there are several common physical features
which are to be employed for quantum information processing. The dot has a two-level
ground state, provided by the spin of trapped electrons in a global applied magnetic field.
This spin provides the physical qubit. The dot also has several optical excited states formed
from the addition of an exciton to the dot. One of these excited states forms an optical
$\Lambda$-system with the two ground states, allowing not only single qubit control via
stimulated Raman transitions but also selective optical phase shifts of dispersive light (to be
discussed in Sec. 3.3) or state-selective scattering. These enable several possible
means to achieve entanglement mediated by photons.

3.2. Nanophotonics

The quantum dots will be incorporated in small cavities to enhance their interaction with 
weak optical fields. Cavities may be made from a variety of technologies, including pho- 
tonic crystal defects and microdisks. Here, we will focus on suspended microdisk cavities. 

The small microdisks are in turn coupled to larger waveguides arranged as disks, rings, 
or straight ridges, which carry qubit-to-qubit communication signals. These waveguides 
can be ridges topographically raised above the chip surface, or line-defects in photonic 
crystals. Our present focus is on ridge-type waveguides. Waveguides are well-advanced 
and relatively low-loss, although it is best to make the waveguides as straight as possible, 
and to avoid crossing two waveguides in the floor plan. Silicon at telecom wavelengths, for 
example, makes a good waveguide for our purposes, as it is almost transparent to 1.5 μm
light, with a loss of about 0.1 dB/cm. The coherent processing of single photons in on-chip
waveguides has recently been well demonstrated for ridge-type silica waveguides [79].

The "no crossing waveguides" restriction is one of the two key issues driving device 
layout. The other is the need to route signals to more than one possible destination, for 
which high-speed, low-loss optical switching is required. Good optical switches are dif- 
ficult to build: many designs have poor transmission of the desired signals and poor ex- 
tinction of the undesired ones, and tend to be large and slow. In our architecture, we focus 
on microdisk-type or microring-type add/drop filters. In suspended silica systems, these 
switches have been shown to have insertion losses as low as 0.001 dB for the "bus" when
the microdisk is off-resonant; optical loss from the bus to the drop port can be as low as 0.3
dB when the system is resonant. On-chip switches in semiconductor platforms do not
typically feature such nearly ideal behavior but continue to improve. For example, 40 μm
by 12 μm multi-ring add-drop switches with a loss of a few dB were recently demonstrated
in a silicon platform.
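
Because optical loss determines which entanglement mechanism is usable on a given path, it helps to convert the quoted dB figures into transmission probabilities. The Python sketch below does this for a hypothetical path. Combining the silicon waveguide loss (0.1 dB/cm) with the suspended-silica switch figures (0.001 dB bus, 0.3 dB drop) is an assumption made only for illustration, as are the function names and the example path.

```python
def db_to_transmission(loss_db: float) -> float:
    """Convert an optical loss in dB to a power transmission fraction."""
    return 10.0 ** (-loss_db / 10.0)


def path_transmission(
    waveguide_cm: float,
    bus_passes: int,
    drop_passes: int,
    waveguide_db_per_cm: float = 0.1,  # silicon near 1.5 um, as quoted in the text
    bus_db: float = 0.001,             # off-resonant switch passed on the bus
    drop_db: float = 0.3,              # resonant switch routing light to the drop port
) -> float:
    """Transmission of a path made of waveguide length plus switch traversals."""
    total_db = (
        waveguide_cm * waveguide_db_per_cm
        + bus_passes * bus_db
        + drop_passes * drop_db
    )
    return db_to_transmission(total_db)


# Hypothetical path: 1 cm of waveguide, 10 off-resonant switches, 1 drop port.
t = path_transmission(waveguide_cm=1.0, bus_passes=10, drop_passes=1)
print(f"path transmission is about {t:.3f}")
```

In this example the single drop-port traversal contributes more loss than the waveguide and all ten bus passes combined, which is consistent with treating paths that cross a drop port as a lossier class of connection.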

We need to individually control the resonance of every optical microdisk in the circuit; 
these microdisks provide the add/drop switches and qubit-hosting cavities. Ultimately, it is 
the ability to rapidly move these microdisk resonators into and out of near-resonance with 
the waveguided control light that provides the quantum networking capability. A candi- 
date method for this is to employ the optical nonlinearity of the semiconductor substrate. 
A strong, below-gap laser beam focused from above onto one of the cavities will shift its 
index of refraction through a combination of heating, carrier creation, and intrinsic optical
nonlinearities. The laser pulses for this may be carried through free space from a
micromirror array [75].

To complete the architecture, we will also need mode-locked lasers for single-qubit 
control, modulated CW-lasers for quantum non-demolition (QND) measurements as well 
as deterministic and heralded entanglement gates, and photodiodes to measure the intensity 
of the control light. Lasers and photodiodes are expensive in both space and manufactur- 
ing cost, so an ideal system will be carefully engineered to minimize the number required. 
Mode-locked lasers with repetition frequency tuned to the Larmor frequency of spin qubits 
will be used for fast single-qubit rotations. These lasers may be directed by the same
micromirror used for switching. More slowly modulated single-frequency lasers will be
used for qubit initialization, measurement, and entanglement operations. These lasers may
be incorporated into the chip, or injected via a variety of coupling technologies. The pho- 
todiodes are intended to measure intensity of pulses with thousands to millions of photons, 
rather than single-photon counting, which allows the possibility of fast, on-chip, cavity- 
enhanced photodiodes; however, off-chip detectors may be more practical depending on 
the semiconductor employed. 

These resources are crucial, as they are needed for every single-qubit measurement 
and heralded entangling operation. These operations dominate the operation of a cluster- 
state-based quantum computer. However, these same technologies are evolving rapidly for 
classical optoelectronic interconnects, and are expected to continue to improve in coming 
years. 

3.3. Executing Physical Gates 

Four types of physical gates are employed in this architecture. 

The first type of gate is arbitrary single qubit rotations, which may be performed efficiently
using picosecond pulses from a semiconductor mode-locked laser with pulse repetition
frequency tuned to the qubit's Larmor frequency; a cavity is not needed for this
operation, and the pulses used are sufficiently far detuned from the qubit and the cavity
resonance that the cavity plays little role. The phase and angle of each rotation are determined
via switching pulses through fixed delay routes, as described in Ref. [67]. The performance
of this gate is limited by spurious excitations created in the vicinity of the quantum dot by
the pulse, and not by optical loss or other architectural considerations.

The next type of gate is the quantum non-demolition (QND) measurement of a single
qubit. This gate is critical, since the initialization and measurement of every qubit is very 
frequent in our tQEC architecture, and the QND gate allows both. A QND measurement 
makes use of the optical microcavity containing the dot, and operates with the cavity well 
detuned from the dot's optical transitions. In such a configuration, an optical transition to 
one qubit ground state may present a different effective index of refraction for a cavity 
mode than the optical transition to the other qubit ground state. This results in a qubit- 
dependent optical phase shift of a slow optical pulse coupled in and out of the waveguide. 
This optical pulse may then be mixed with an unshifted pulse from the same laser to ac- 
complish a homodyne measurement of the phase shift. In one variation of this scheme, this 
phase is detected as a change in the polarization direction of a linearly polarized optical 
probe beam; this has been demonstrated for quantum dots both with and without a
microcavity; larger phase shifts have also been observed in neutral dots in improved
photonic crystal cavities [70]. Simulations indicate that pulses with a timescale of about
100 ps may be used for this gate.

These first two gate types are single-qubit gates. For generating entanglement between 
distant qubits, two further gates are employed: a deterministic, nearest-neighbor gate, and 
a non-deterministic gate for heralded entanglement generation for distant qubits. 

The deterministic, nearest-neighbor gate will be mediated by a common microdisk 
mode connecting the cavities joining nearby qubits. The phase or amplitude of this cavity 
mode may be altered by the state of the qubits with which it interacts, which in turn changes
the phase or population of those qubits. The gate is achieved by driving the coupled cavity
mode with one or more appropriately modulated optical pulses from a CW laser. The light
is allowed to leak out of the cavity and may then be discarded. The amplitude version of
such a gate was proposed in 1999 by Imamoglu et al. and may be viewed as a pair of
stimulated Raman transitions for two qubits driven by two CW lasers and their common
cavity mode. This gate is known to require high-Q cavities. The phase version of this
gate is an adaptation of the "qubus" gates proposed by Spiller et al. in 2006; more
detailed design and simulation of this gate in the present context is in progress.

If such deterministic gates are available, one may naturally ask whether a fully
two-dimensional architecture of coupled qubits is more viable than the communication-based
architecture we present here. Indeed, if truly reliable cavity QED systems can be developed
on a large scale, deterministic photonic-based gates may enable highly promising
single-photon-based architectures for tQEC. However, the devices that will enable
deterministic CQED gates in solid-state systems are unlikely to be fully reliable.

In particular, high-fidelity deterministic gates require extremely low optical loss be- 
tween qubits, and therefore cannot easily survive coupling to straight waveguides or to 
other elements in the photonic circuit such as switches and fibers. For generating entan- 
glement through these elements, stochastic but heralded entanglement schemes are used, 
similar to gates in linear optics except with physical quantum memory. Combined with lo- 
cal single-qubit rotations, QND measurements, and deterministic nearest-neighbor gates, 
this heralded entanglement allows quantum teleportation. Heralded entanglement is the 
bottleneck resource in quantum wiring. Heralded entanglement gates come in several fla- 
vors, but fortunately each type requires the same basic qubit and cavity resource; they vary 
in the strength of the optical field used and the method of optical detection. Which type to 
employ depends on the amount of loss between the qubits to be entangled. 

For qubits with relatively low loss between them, such as those coupled to a common 
waveguide without traversing to the drop port of a switch, so-called "hybrid" schemes are 
attractive [86, 68]. In these schemes, the QND measurement discussed above is extended to 
two qubits, distinguishing odd-parity qubit subspaces from even-parity states. For some 
detection schemes, such as x-homodyne detection, this parity gate may be deterministic, 
up to single-qubit operations which depend on measurement results. If such parity 
gates are available, "repeat-until-success" schemes for quantum computation are very 
attractive and have been proposed for use in multicomputer-like distributed systems. 
However, if weak CQED nonlinearities are employed with lossy waveguides, these detection 
schemes fail [86, 68]. In this case, p-homodyne detection may still show strong performance, 
but the parity gate is incomplete. The heralded measurement of an odd-parity state 
may project qubits into an entangled state with probability ~ 50%, but when this fails 
no entanglement is present. As in schemes using linear optics, this allows probabilistic 
quantum logic. With the addition of an extra ancilla qubit, this partial parity gate may be 
combined into a probabilistic CNOT gate for entanglement purification. 
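
The roughly 50% heralding probability quoted above can be illustrated with a minimal sketch. It assumes an idealized projective parity measurement on two qubits that each start in the state (|0> + |1>)/sqrt(2); it does not model the homodyne detection or optical loss, and the function name `odd_parity_projection` is a hypothetical helper introduced only for this illustration.

```python
import math

def odd_parity_projection(state):
    """Project a two-qubit state onto the odd-parity subspace {|01>, |10>}.

    `state` holds four amplitudes ordered |00>, |01>, |10>, |11>.
    Returns (probability of the odd-parity outcome, normalized state after it).
    Idealized model: a perfect projective measurement, no loss.
    """
    odd = [0.0, state[1], state[2], 0.0]      # keep only |01> and |10>
    prob = sum(abs(a) ** 2 for a in odd)      # Born-rule probability
    if prob == 0.0:
        return 0.0, None
    norm = math.sqrt(prob)
    return prob, [a / norm for a in odd]

# Both qubits in (|0> + |1>)/sqrt(2): all four amplitudes equal 1/2.
plus_plus = [0.5, 0.5, 0.5, 0.5]
prob, post = odd_parity_projection(plus_plus)
print(prob)   # probability of heralding the odd-parity outcome
print(post)   # equal superposition of |01> and |10>, an entangled state
```

In this idealized picture the odd-parity outcome occurs half the time and leaves the pair in an equal superposition of |01> and |10>; the other half of the time no useful entanglement is heralded.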

This scheme is attractive due to its use of relatively bright laser light and near-ideal 
probability of successful heralding. However, it is strongly subject to loss, as has been 
discussed previously. More complex measurement schemes may improve the fidelity 
of such gates at the expense of their probability of heralding a success. For very lossy 
connections, the number of photons in the optical pulse might be reduced to an average 
of less than one photon, in which case single-photon scattering schemes [69, 70, 71] would 
be employed. These schemes succeed much more infrequently, as they rely on the click of 
a single photon detector projecting the combined qubit/photon system into one where no 
photons were lost, a possibility whose probability decreases with loss. Here, we consider 
only many-photon qubus gates using homodyne detection as discussed in Ref. [68]; we 
compensate for different connections with different loss rates only by changing the intensity 
of the optical pulses employed, whose optimum varies with loss. The detection scheme 
remains constant across the architecture. 

Although proposals for nonlocal, deterministic gates exist, their performance is always 
hindered by optical loss. This is an inevitability: if photons are mediating information 
between qubits, the loss of those photons into the environment inevitably reveals some in- 
formation about the quantum states of the qubits, causing decoherence. A well-designed 
photon-mediated architecture should use a hierarchy of photon-mediation schemes to pro- 
vide high-success-probability gates at low distances and highly loss-tolerant gates at higher 
distances, and the qubus mechanisms allow some degree of hierarchical tuning without 
adding extra physical resources. 

In the present discussion, we characterize performance entirely in terms of optical loss. 
Photons may be lost in waveguides, from cavities, from the cavity-waveguide interfaces, 
and from spontaneous emission. An approximation of the amount of decoherence-causing 
loss at a quantum-dot-loaded cavity and cavity/waveguide interface, when running hybrid 
CQED-based gates optimally, is the inverse of the cooperativity factor $C$. This 
factor arises from the ratio of spontaneous emission into a cavity mode (assumed to be 
overcoupled to the waveguide) to spontaneous emission into other modes. It scales as the 
quality factor of the cavity divided by its mode volume, so the cavities containing qubits 
are designed small to maximize this factor. When we discuss qubit-to-qubit optical loss, 
this loss should be considered as the linear loss in the waveguide connecting the qubits 
plus about $C^{-1}$. Cooperativity factors between self-assembled quantum dots and the 
whispering gallery modes of suspended microdisks have been shown to approach 100 [92, 93], 
corresponding to a cavity-induced loss limit of 0.04 dB. 
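
As a quick check of the quoted figure, the following sketch converts a cooperativity into a loss in decibels, assuming that a fraction 1/C of the light is lost at the cavity. The function name `cavity_loss_db` is illustrative only.

```python
import math

def cavity_loss_db(cooperativity):
    """Loss in dB if a fraction 1/C of the optical power is lost at the cavity."""
    lost_fraction = 1.0 / cooperativity
    return -10.0 * math.log10(1.0 - lost_fraction)

loss_at_c100 = cavity_loss_db(100.0)
print(round(loss_at_c100, 3))   # about 0.044 dB, i.e. the 0.04 dB quoted above
```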

4. Architecture: Layout and Operational Basics 

In this section, we qualitatively describe our architecture and its operation. Many of the 
design decisions described here will be justified numerically in Section 5. 

4.1. Architecture Axes 

The basic structural element of our system is one-dimensional: a waveguide with a tangent 
series of microdisks, each connected to one or more smaller microdisks containing quantum 
dots, as in Fig. 2. The shared bus nature of a single waveguide offers the advantage 
that the qubit at one end can communicate quickly and easily with the qubit at the other 
end; this long-distance interaction has the potential to accelerate some algorithms and aids 
in defect tolerance, as we will show below. However, that shared nature makes the bus 
itself a performance bottleneck in the system, as contention for access to the bus and the 
measurement device forces some actions to be postponed. 

This limitation on concurrent operation makes it natural to consider using multiple 
columns. Columns are connected by teleportation, aided by heralded entanglement and 
purification. The resulting structure, developed in Figures 2 to 5, is a set of many columns, 
defined by long, vertical waveguides, interspersed with smaller, circular and oval waveguides, 
and qubits in cavities tangential to the waveguides. The vertical waveguides are of 
two types: logic waveguides, which are used to execute operations between qubits within 
one column, and teleportation waveguides, which are used to create and purify connections 
between columns within a single chip or between chips. The small, colored circles 
represent the smallest microcavities containing quantum-dot qubits. The different colors 
represent different roles for particular qubits, which we describe in Section 4.2. The 
teleportation columns do not use the smaller, higher-Q circular waveguides to couple qubits 
deterministically. Instead, as in Figures 3 and 4, they use larger racetrack-shaped waveguides 
that can support a larger number of qubits which are only stochastically entangled, 
called transceiver qubits. The qubits along one racetrack can be used to purify ancilla 
qubits, allowing us to connect qubits in potentially distant parts of the chip, or to connect 
to off-chip resources. 

The architecture in Fig. 5 is designed to minimize both the length of waveguides and the 
number of switches traversed by pulses carrying quantum information. Note that signals 
introduced onto the waveguide snaking through the chip will not be perfectly switched 
into the detectors, implying some accumulated noise; however, this effect can be mitigated 
with appropriate detector time binning and sufficiently large microdisk Q-factors in the 
switches. 

A single node has two axes of growth. The length of a logic waveguide column and 
the number of columns provide the basic rectangular layout, which will have some flexibility 
but is ultimately limited by the size of the chip that can be practically fabricated, packaged 
and used. To give a concrete example, if we set the vertical spacing of the red lattice qubits 
to 50 μm and the column-to-column spacing to 100 μm, 100 qubits in each vertical column 
and 100 columns will result in the active area of the chip being 5 mm by 10 mm. 
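
The arithmetic behind this example can be written out as a short sketch; the function and parameter names are illustrative, and the pitches are the ones assumed in the example above.

```python
def active_area_mm(qubits_per_column, columns, row_pitch_um, column_pitch_um):
    """Return (height_mm, width_mm) of the active area for a rectangular layout."""
    height_mm = qubits_per_column * row_pitch_um / 1000.0
    width_mm = columns * column_pitch_um / 1000.0
    return height_mm, width_mm

height_mm, width_mm = active_area_mm(100, 100, 50.0, 100.0)
print(height_mm, width_mm)   # 5.0 10.0
```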

A third axis of growth is the number of chips that are connected into the overall system 
- the number of nodes in our multicomputer. In previous work, we have been concerned 
with the topology and richness of the interconnection network between the nodes of a 
multicomputer using CSS codes, finding that a linear network is adequate for many 
purposes. The extension of nodes into the serpentine teleportation waveguide in Fig. 5 
enables such a linear-network multicomputer, although the additional necessary resources 
for bridging lossier chip-to-chip connections will not be considered here. 

The structures in our architecture are large by modern VLSI standards; the principal 
fabrication difficulty is accurate creation of the gap between the cavities and the waveguides. 
That spacing must be 10-100 nm, depending on the microdisk and waveguide size 
and quality factors. The roughness of the cavity edge is a key fabrication characteristic 
that determines the quality of the cavity, and ultimately the success of our device. 

Although the device architecture and quantum dot technology are not yet fixed, we 
include images of test devices fabricated using e-beam lithography following the methodology 
described in Ref. [93], only to help visualize future devices. Figures 2 and 3 include 
scanning electron microscope images of a device created in a GaAs wafer containing a 
layer of self-assembled InAs quantum dots. Fabrication techniques more scalable than 
e-beam lithography must ultimately be developed; promising routes include 
nanoimprint lithography and deep sub-wavelength photolithography. 



4.2. Qubit Roles and Basic Circuits 

The different colors for the qubit quantum dots in Figure 3 represent different roles within 
the system. Physically, the cavities are identical, but they are coupled to different waveg- 
uides, allowing them to interact directly with different sets of qubits. Within those connec- 
tivity constraints, their roles are software-defined and flexible. Finding the correct hardware 
balance among the separate roles is a key engineering problem. The answer will depend 
on many parameters of the physical system, including the losses in switches and couplers, 
and will no doubt change with each successive technological generation. 

The red qubits in the figures, in the column vertically placed between the larger circles, 
are the lattice qubits. Those that are functional are assigned an effective (x, y) position in 
the 2-D lattice used to implement tQEC. These are subsequently divided into code qubits, 
which are never directly measured, and syndrome qubits, which are regularly measured fol- 
lowing connections to code qubits in order to maintain the topologically protected surface 
code. The ideal number and density of syndrome qubits among code qubits depends on the 
yield. Within a column, all functional nearest neighbor pairs of qubits can be coupled in 
parallel. Non-nearest-neighbor couplings can only occur sequentially. For very low yields, 
in which code qubits rarely have nearest-neighbor couplings, only a few syndrome qubits 
per column are required as the syndrome circuits must largely be implemented sequentially, 
implying the syndrome qubits can be reused. 

The blue qubits, or transceiver qubits, are aligned with the racetracks and the long pu- 
rification waveguides. These qubits are used to create Bell pairs between column groups 
within the same device, or between devices. Because purification is a very resource- 
intensive process, the transceiver qubits are numerically the dominant type. 

The green qubits, sandwiched between the column of circles and the column of race- 
tracks, are ancilla qubits, used to deterministically connect stochastically created entangled 
states among (blue) transceiver qubits to (red) lattice qubits. The green qubits also play an 
auxiliary role during the purification of the blue qubits. 

The circuit, or program, for executing purification on the blue qubits is shown in Figure 3. 
The blue qubits have previously been measured and are thus initialized to a known 
state. Then, qubits in a given teleportation column of Figure 5 are entangled with qubits 
in either the same column or the one neighbouring it to the right using the heralded 
entanglement generation technique discussed in Sec. 3.3. Note that waveguide loss prevents the 
efficient entangling of qubits in widely separated teleportation columns. In general, a laser 
pulse is inserted in the teleportation waveguide at a given column, coupled with a qubit in 
that column, coupled with a second qubit either in that column or the one neighbouring it to 
its right, and then switched out of the teleportation waveguide and measured. This process 
is repeated in rapid succession, building a pool of low-fidelity entangled pairs, creating the 
$|\Psi^{+}\rangle$ states at the left edge of Figure 3. 




Fig. 2. (a) Layout and pulse path for executing a local, high-fidelity controlled-Z gate. An optical pulse couples 
from the straight waveguide to the microdisk waveguide; the two qubits of interest are introduced to the logic 
gate by bringing their cavities into resonance with the optical pulse. (b) Scanning electron micrograph of a 
non-functional demonstration device, fabricated in GaAs with (unshown) InAs quantum dot layer. The structures are 
underetched following the methods presented in Ref. 93. 

Once the base-level entangled pairs are created, the circuit in Figure 3 is executed 
within each column; it employs two probabilistic parity gates to achieve the 
controlled-NOT operations used in entanglement purification. Purification proceeds until 
entangled state fidelities are considered sufficient for computation. At that time the purified 
entanglement between blue transceiver qubits is used to create appropriately entangled 
(green) ancillae, which are connected to the target lattice qubits. 

Finally, the high-fidelity Bell pairs are used to create the tQEC lattice, using the 
clustering circuit shown in Fig. 4. 

4.3. Lattice 

The most important issue in the generation of a cluster state in our geometry is the physical 
asymmetry between connections within a column, those with other columns, and those 
between dies. The hierarchy of connection distances in our system will be characterized in 
terms of the number of laser pulses and measurements required to achieve entanglement of 
a particular fidelity. 

Entangling two qubits connected to the same circular waveguide is straightforward; we 
can refer to these as "cavity connected" or "C-connected." Racetracks are a longer, and 
slightly lower-fidelity, form of cavity; we refer to two ancillae or two transceiver qubits on 
the same racetrack as "R-connected", or racetrack-connected. Two lattice qubits connected 
through an R-connected Bell pair are said to be indirectly connected, or "I-connected". 

Fig. 3. (a) Partial circuit for executing purification on long-distance Bell pairs. The diamonds represent a 
probabilistic parity gate which projects two qubits into an odd-parity subspace with probability of approximately 50%. 
These gates are achieved via pulses routed through the racetrack waveguides via the ring-waveguide labelled 
"switch". All measurements are in the X basis. (b) The basic layout unit is a column of racetrack and circular 
waveguides sandwiched between the straight purification and logic waveguides. (c) Zoom-out of the same device 
shown in Fig. 2(b). 

Within a logic column, many deterministic gates on C-connected qubits can be per- 
formed without purification, and a high level of parallelism may be employed. The pulses 
that execute deterministic gates on the logic waveguide couple into the cavities only 
weakly, and do not need to be measured after the gate, making it possible that the same 
strong pulse could be used to execute several gates concurrently. If we label the qubits with 
the pattern ABABA..., we may be able to couple all of the AB pairs in one entangling 
time slot, then couple all of the BA pairs in the second time slot. 
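
The ABABA... labelling can be sketched as a two-slot schedule. This is a simplified illustration that assumes a pair is usable only when two physically adjacent sites both hold working qubits; the function name and the boolean yield map are hypothetical.

```python
def two_slot_schedule(functional):
    """Assign nearest-neighbor pairs in one column to two time slots.

    `functional` is a list of booleans, one per physical site in the column.
    Pairs starting at an even site go to slot 0 (the "AB" pairs) and pairs
    starting at an odd site go to slot 1 (the "BA" pairs), so no qubit takes
    part in two gates within the same slot.
    """
    slots = ([], [])
    for i in range(len(functional) - 1):
        if functional[i] and functional[i + 1]:
            slots[i % 2].append((i, i + 1))
    return slots

slot_ab, slot_ba = two_slot_schedule([True, True, True, False, True, True])
print(slot_ab)   # [(0, 1), (4, 5)]
print(slot_ba)   # [(1, 2)]
```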

The fidelity of W connections is dominated by the efficiency of coupling pulses into and 
out of cavities, as the loss in the waveguide will be negligible. When connecting two lattice 
qubits in columns separated by a purification waveguide, we require moderate amounts of 
purification. The purification ancillae are themselves W-connected; the post-purification 
lattice connection we refer to as "$P_W$-connected". 

Finally, qubits that do not share the same purification waveguide must be connected 
using a pulse that transits one or more switches. We refer to these physical connections 
as $X$ or $X_{i,j}$ connections, where $i$ is the number of switches and $j$ is the number of I/O 
ports that must be transited. Lattice qubits connected after purification we refer to as 
$P_X$-connected. 

The $P_W$-connections and $P_X$-connections will be most strongly subject to bottlenecks 
from the limited number of laser pulses and detection events in our architecture, and are 
therefore the focus of our numeric studies in the next section. 


Fig. 4. (a) Partial circuit and (b) qubit/cavity layout and pulse path for executing long-distance clustering 
operations. This circuit and a matching one elsewhere in the system execute the logical controlled-Z gate between 
two lattice (red) qubits in a teleported fashion (which we call a telegate) by using a high-fidelity Bell pair built on 
transceiver (blue) qubits. The four qubits used in this circuit are highlighted in the layout. The second transceiver 
qubit and the ancilla (green) are used as ancillae in this circuit. The diamonds represent probabilistic ($P \approx 50\%$) 
parity gates on the racetrack-shaped waveguide, between either the two transceiver qubits or the transceiver and 
the ancilla. The gate in the dashed-line box in (a) is executed by enabling the two qubits in the box in (b). All 
measurements are in the X basis. The physical CZ gate in the top row is performed using the circuit of FigurefS] 



5. Resource Estimates 

Given a set of technological constraints (pulse rate, error rate, qubit size, maximum die 
size), a complete architecture will balance a set of tradeoffs to find a sweet spot that effi- 
ciently meets the system requirements (application performance, success probability, cost). 
Minimizing lattice refresh time is the key to both application-level performance and fault 
tolerance, but demands increased parallelism (hence cost); in our system, this favors a very 
wide, shallow lattice, which is more difficult to use effectively at the application level. In- 
creasing the number of application qubits increases the parallelism of many applications 
(including the modular exponentiation that is the bottleneck for Shor's algorithm), but if 
the space dedicated to the singular factory does not increase proportionally, performance 
will not improve. 

We begin by describing the communication costs and the impact of loss on the lattice 
refresh cycle time in a generic 2-D multicomputer layout, from which we can calculate the 
effective logical clock cycle time for executing gates on application qubits. With these 
concepts in hand, we then propose an architecture, and calculate its prospective performance. 

[Figure 5 labels: W, R, I, C, X, $P_W$ and $P_X$ connections; repeated core lattice region.] 

Fig. 5. The nanophotonic quantum multicomputer architecture. Small microdisks containing lattice, ancilla, 
and transceiver qubits are color-coded while waveguides and microdisk-based add-drop switches are indicated 
by black lines. This schematic indicates the critical elements of the nanophotonic chip layout described in the 
text, but the structures shown are not to scale. In particular, the modulated CW lasers and detectors shown are 
the largest elements and are likely to be off-chip. The pink squares indicate the location of beam-splitters defined 
by evanescently coupled ridge waveguides, which split a single laser pulse (indicated by a blue line) into probe 
(red line) and local oscillator (LO, green line) optical pulses. These pulses travel two paths; one is buffered by 
a serpentine waveguide which delays the probe by several times the pulse width of approximately 100 ps. (The 
pulse colors are schematic only; these pulses are to be monochromatic.) The probe is switched to follow the LO 
along the same route through the teleportation waveguides of the core chip, which depends on the qubits to be 
coupled. Single passes from top to bottom, such as the one shown by the red and green lines, enable the similar 
"W connections" and "$P_W$ connections" between qubits as shown on the right. A U-shaped path (not shown) 
would enable the longer-distance "X" and "$P_X$" connections. Lasers directly coupled into waveguides enable 
C connections and mediate logic within the circular microdisks connecting lattice qubits to ancilla qubits. The 
rectangular region in the center is repeated many times vertically and horizontally. 



5.1. Communications and Lattice Refresh 

Fig. 6. For qubus connections, impact of signal loss on the final fidelity achievable using symmetric purification. 
Error bars represent the RMS of the number of pulses, which is close to the average number; the distribution is 
strongly Poisson-like. 

Figure 6 shows the residual infidelity and the cost in teleportation waveguide pulses 
as a function of the loss in the probe beam from qubit to qubit through the waveguides. 
Purification is performed using only Bell pairs of symmetric fidelities, and is run until final 
fidelity saturates or until fidelity is better than 99.5%. The two curves represent two values 
of round-trip loss in the racetrack waveguides used for local parity gates; with local loss of 
0.2%, we cannot achieve a final fidelity above the threshold for tQEC. Thus, we establish 
an engineering goal of 0.02% loss or better. 
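
To give a feel for how symmetric purification approaches a target fidelity, the sketch below iterates the textbook recurrence of Bennett et al. (BBPSSW) for Werner-state pairs. This is a stand-in chosen for illustration, not the calculation behind Fig. 6: it assumes perfect local operations, so fidelity never saturates below 1, whereas the results above include local loss in the parity gates. The function names are hypothetical.

```python
def purify_step(f):
    """One BBPSSW round on two Werner pairs of fidelity f.

    Returns (new_fidelity, success_probability). Assumes ideal local gates.
    """
    e = (1.0 - f) / 3.0                      # weight of each unwanted Bell state
    success = f * f + 2.0 * f * e + 5.0 * e * e
    return (f * f + e * e) / success, success

def rounds_to_target(f0, target=0.995, max_rounds=50):
    """Count symmetric rounds until fidelity reaches `target` or stops improving."""
    f, rounds = f0, 0
    while f < target and rounds < max_rounds:
        f_next, _ = purify_step(f)
        if f_next - f < 1e-12:               # fidelity has saturated
            break
        f, rounds = f_next, rounds + 1
    return f, rounds

final_f, n_rounds = rounds_to_target(0.9)
print(final_f, n_rounds)
```

Each symmetric round consumes two pairs of equal fidelity and succeeds only probabilistically, which is why the pulse counts in Fig. 6 grow quickly as the starting fidelity drops.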

The values in Fig. 6 are calculated by generating a Markov probability matrix for the 
protocol of symmetric purification, where each matrix transition requires the generation 
and detection of an optical pulse in the teleportation waveguide. Probabilities and 
fidelities for each step are found using the formalism presented in Ref. [68]. Many of these 
transitions are deterministic, but some are not due to the probability of parity gates failing 
or the purification protocol failing. Exponentiation of this matrix allows the direct calculation 
of the probability of completing the protocol in a given number of steps, allowing 
calculation of the probability density function for completion of purification vs. number 
of optical pulses. These probability distributions are strongly Poissonian. They are used to 
calculate the average and root-mean-square number of pulses plotted in Fig. 6. 
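
The Markov approach can be illustrated on a toy chain. The sketch below is not the purification protocol itself: it assumes a hypothetical process that needs two consecutive successful pulses, each succeeding with probability `p_success`, with any failure sending the process back to the start. Repeated multiplication by the transition matrix (equivalent to taking matrix powers) gives the probability of completing in a given number of pulses, from which the mean and RMS follow.

```python
import math

def step(dist, matrix):
    """Advance a probability row vector by one pulse: dist' = dist * matrix."""
    size = len(dist)
    return [sum(dist[i] * matrix[i][j] for i in range(size)) for j in range(size)]

def completion_stats(p_success, stages=2, max_pulses=400):
    """Mean, RMS and distribution of the pulse count for the toy chain."""
    done = stages
    matrix = [[0.0] * (stages + 1) for _ in range(stages + 1)]
    for k in range(stages):
        matrix[k][k + 1] += p_success        # success: advance one stage
        matrix[k][0] += 1.0 - p_success      # failure: start over
    matrix[done][done] = 1.0                 # completion is absorbing
    dist = [1.0] + [0.0] * stages
    pmf, reached = [], 0.0
    for _ in range(max_pulses):
        dist = step(dist, matrix)
        pmf.append(dist[done] - reached)     # probability of finishing on this pulse
        reached = dist[done]
    mean = sum((n + 1) * q for n, q in enumerate(pmf))
    rms = math.sqrt(sum((n + 1) ** 2 * q for n, q in enumerate(pmf)))
    return mean, rms, pmf

mean_pulses, rms_pulses, pmf = completion_stats(0.5)
print(mean_pulses, rms_pulses)
```

The truncation at `max_pulses` is an assumption of the sketch; it is harmless here because the probability of not finishing decays geometrically.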

This Markov analysis is useful for estimating performance, but overestimates the required 
spatial and temporal resources considerably. The strictly symmetric purification 
routine assumed here makes less than ideal use of qubit memory; alternative resource 
management strategies can lead to order-of-magnitude improvements in speed without a 
comparable increase in size, as considered, for example, in Ref. [30]. Also, the calculation we 
have performed assumes that when parity gates fail in the circuit shown in Fig. 3(a), the 
entire procedure fails and entangled pairs must be regenerated and repurified. In fact, if one 
parity gate succeeds and the other fails, then one Bell pair preserves some of its entanglement 
and may be kept, possibly with a Pauli correction, for subsequent purification rounds. 
Optimizing the purification procedure to account for such possibilities is difficult to do 
analytically; Monte Carlo simulations such as those in Ref. [30] may estimate the worth of 
these strategies, but we leave such simulations for future work. 

With the proper layout, we can connect multiple chips into a two-dimensional structure. 
With $V$ rows of $H$ chips each, and a chip that consists of $C$ columns each containing $R$ 
rows of lattice qubits, we have a physical structure capable of supporting an $HC \times VR$ 
lattice. In such a multicomputer, entangling pulses may be destined for another qubit in the 
same column in the same chip, another qubit in the same column but the chip below, or a 
qubit in the neighboring column to the left or right. With multiple possible destinations, 
switching is naturally required; we can arrange the switching so that vertical connections are 
$X_{1,1}$ connections and horizontal ones are $X_{2,1}$ connections. Assessing the scalability of such 
a system and establishing guidelines for configuring the system depend on understanding 
these connections. 

Table 1 lists the costs for the lattice building operations on such a switched multicomputer 
architecture. We compare two logical lattices, a direct-mapped $HC \times VR$ logical 
lattice and a sub-lattice-organized $HCs \times VR/s$ logical lattice in which each physical 
column is used as a small $R/s \times s$ lattice. The physical yield affects the probability that 
two neighboring lattice qubits and their shared ancilla are good, and hence the probability 
that a C connection can be used. Additionally, for low yields ($y < 0.8$), we assign only a 
few qubits per column as tQEC syndrome qubits, forcing all lattice cycle operations to use 
$P_W$-connected gates. 

Table 1. Number and types of connections per physical waveguide for lattice-building for an $H \times V$ 
multicomputer with $C \times R$ lattice qubits per node, $HC$ total laser input ports, and lattice sub-factor $s$. 
Expressions assume $R \bmod s = 0$. $R_f = R y_e = R y_p (1 - (1 - y_p)^2)$ is the functional number of qubits 
in a column. 

Connection type                    100% yield        Physical yield $y_p$ 
C                                  $2V(R - s)$       $n_C = 2V(R_f - s)$ yl (for $y_p > 0.8$) or 0 (for $y_p < 0.8$) 
$P_W$                              $V(2R - R/s)$     $n_W = V(2R_f - R_f/s) + 2V(R_f - s) - n_C$ 
V neighbor ($P_X$ ($X_{1,1}$))     $2s(V - 1)$       $n_{X1} = 2s(V - 1)$ 
H neighbor ($P_X$ ($X_{2,1}$))     $VR/s$            $n_{X2} = VR_f/s$ 
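
The 100%-yield column of Table 1 can be evaluated directly. The sketch below is a plain transcription of those four expressions; the function name and dictionary keys are illustrative, and the example values of V, R and s are chosen only to exercise the formulas.

```python
def connection_counts_full_yield(v, r, s):
    """Connections per physical waveguide at 100% yield (Table 1).

    v: rows of chips, r: lattice qubits per column, s: lattice sub-factor.
    """
    if r % s != 0:
        raise ValueError("the expressions assume R mod s == 0")
    return {
        "C": 2 * v * (r - s),
        "P_W": v * (2 * r - r // s),
        "P_X (X_1,1)": 2 * s * (v - 1),
        "P_X (X_2,1)": v * r // s,
    }

counts = connection_counts_full_yield(v=1, r=770, s=1)
print(counts)
```

With a single row of chips (V = 1) the vertical-neighbor count vanishes, which is the situation exploited when the layout is restricted to one row.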



b The table assumes that $R \bmod s = 0$. Although that is not a requirement, the expressions are more complex 
for $R \bmod s \neq 0$; without careful structuring, potentially as many as half of the $P_W$ connections may become 
$P_X$ for $X_{1,1}$. 

We observe several qualitative facts about this architecture: 

• The lattice cycle time is constant as $H$ increases, but the number of lasers and 
measurement devices must increase proportionally. 

• To first order, the lattice cycle time scales linearly with $VR$, but second-order 
effects will likely make it worse than linear. 

• The number of $X_{2,1}$ connections favors a sub-lattice with a large $s$, but the minimum 
size of the logical lattice limits $s$; we require $14d < VR/s$. 

• Increasing lattice cycle time hurts fidelity due to memory degradation. 

• Increasing lattice cycle time hurts application performance. 

The total lattice refresh cycle time is $t_{lat} = t_{pulse}\, p_{lat}$, where $p_{lat}$ is the number of 
pulse time steps in the complete cycle. The final, logical clock rate for application gates 
depends on both the refresh cycle and the temporal extent of the lattice holes as they move 
through the system to execute logical gates. We can visualize the movement of the holes 
through the temporal dimension as "pipes" routed in a pseudo-3-D space. To maintain the 
same $4d$ perimeter and spacing about the hole as it extends into the temporal dimension, 
each hole movement will also have to extend for hd lattice refresh cycles. We have used 
$d = 14$ as the length of one side of each square hole. The temporal spacing must be 
$4d = 56$, implying that the fastest rate at which hole braiding can occur is once every 
$5d = 70$ lattice refresh cycles. 
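
The braiding period follows from simple arithmetic, sketched below. The refresh time of 49 microseconds used in the example is an assumption taken from the coherence-time entry of Table 2 (1000 refresh cycles in 49 msec); the function names are illustrative.

```python
def braid_cycles(d):
    """Lattice refresh cycles per braid: 5d, as stated in the text."""
    return 5 * d

def braid_time_s(d, t_lat_s):
    """Wall-clock time per braid for a given lattice refresh time."""
    return braid_cycles(d) * t_lat_s

cycles = braid_cycles(14)
t_braid = braid_time_s(14, 49e-6)   # assumed refresh time of 49 microseconds
print(cycles, t_braid)              # 70 cycles, roughly 3.4 ms
```

The result of about 3.4 ms agrees with the hole braiding time listed in Table 2 to within the rounding of the refresh time.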

In our architecture, the logical clock period is $\Omega(d^2)$. The number of refresh cycles per 
logical gate is $\Theta(d)$. The refresh time itself is $\Omega(R) = \Omega(d)$; because we must choose 
$R \propto d$, the number of pulses grows at least linearly in $d$. As the columns lengthen, fidelity 
falls and the number of pulses per cycle grows, creating a positive feedback between $d$ and 
cycle time. 

5.2. Proposed Architecture and Performance 

Table 2 summarizes our initial strawman architecture, depicted in Fig. 5. To factor an $n$-bit 
number using Shor's algorithm, we would like to have $6n$ logical qubits. Having established 
a goal of factoring a 2,048-bit number, we need 12,288 logical qubits. 

Ultimately, the execution of application algorithms in tQEC requires, as at the physical 
level, two components: communication and computation. Logical communication consists 
of routing the pipes through the pseudo-3-D lattice. These pipes can route through the 
space with only a fixed temporal extent, allowing the equivalent of "long distance" gates in 
the circuit model. They do, however, consume space in the lattice, creating a direct tradeoff 
between the physical size of the system and the time consumed. Additionally, the shape of 
the logical lattice determines how efficiently logical qubits can be placed and routed. We 
assign 25% of the logical qubit space for wiring and hole movement space. 

Computation, for many algorithms, will be dominated by Toffoli gates; as some of the 
operations are probabilistic, an average of over ten S and T states is required for each. 
Shor's algorithm requires some $40n^3$ Toffoli gates: $4n^2$ adder calls (after optimizations 
to modulo arithmetic and one level of indirection in the arithmetic), each requiring 
$10n$ Toffoli gates. The total of $40n^3 = 3.2 \times 10^{11}$ Toffoli gates requires over $10^{12}$ S 
states. Again, a direct tradeoff can be made between space and time, as the S states can be 

Table 2. Summary of our proposed serpentine, add-drop filter architecture. $M = 2^{20}$. 



System Hardware 
  Chip lattice, $C \times R$                        128 × 770 
  Multicomputer setup, $H \times V$                 65536 × 1 
  Physical lattice size (in qubits)                 $8M \times 770 = 6.46 \times 10^{9}$ 
  Laser ports                                       $HC = 8M$ 
  Measurement devices                               16M 
  Purification/entanglement pulse rate              10 GHz 
  Switch type                                       add-drop filter 
  Required physical yield                           $y_p = 40\%$ 
  Effective yield for lattice qubits                $y_e = y_p (1 - (1 - y_p)^2) = 25.6\%$ 
  Functional column height                          $R_f = R y_e = 197$ 
  Required local optical loss                       0.02% 
  Required adjusted gate error rate                 $p_{err} \approx p_{thresh}/$[illegible] 
  Required memory coherence time                    $t_{mem} > 1000\, t_{lat} = 49$ msec 


Communication Costs 
  W, $P_W$ connection                               0.1 dB, $p_W = 111$ pulses 
  X connection (neighboring columns)                0.4 dB, $p_X = 1068$ pulses 
  Sub-lattice factor $s$                            
  Logical lattice                                   $8M \times 197$ 
  Pulses per lattice cycle (avg.)                   $p_{lat} \approx n_W p_W + n_{X2} p_X = 4.9 \times 10^{5}$ 
  Lattice cycle time                                $t_{lat} = p_{lat}\, t_{pulse} = 49$ μsec 


Logical Qubit Operations 
  Hole separation constant                          $d = 14$ 
  Lattice area per qubit (at rest, loosely packed)  $14d \times 9d = 196 \times 126 = 24696$ 
  Lattice area per qubit (at rest, tightly packed)  $10d \times 5d = 140 \times 70 = 9800$ 
  Hole movement time                                $5d\, t_{lat} = 3.41$ msec 
  Hole braiding time                                $t_{braid} = 5d\, t_{lat} = 3.41$ msec 
  Toffoli gate construction                         Nielsen & Chuang, p. 182 
  Finished $|S\rangle$ states per Toffoli gate (avg.)  11.5 
  Total braidings of $|S\rangle$ states per Toffoli    1795 
  Toffoli gate time $t_{tof}$                       $\approx 14\, t_{braid} = 48$ msec 


Application Operations 




Maximum capacity, in logical qubits 


119836 


Number of application logical qubits 


6n = 12288 


\S) factory space 


77589 


"wiring" space 


25% = 29959 


Shor 




Length of number to be factored 


n = 2048 


Adder 


Carry-lookahead 


Adder time 


t a dd = 41og 2 nt tof = 2.1 seconds 


Modulo & indirect arithmetic 


w = 2,p = 11, ~ 5X faster than basic VBEl 102 l 103 l 


Number of adder calls 


n ad d = 4n 2 = 1.68 x 10 7 


Number of adders executed in parallel 


1 


Number of Toffoli gates 


n tof = 40n 3 = 3.2 X 10 11 


Time to execute algorithm only 


3.5 X 10 7 seconds (409 days) 


Time to create singular states 


2.7 X 10 7 seconds (314 days) 


Final execution time 


409 days 



built in parallel. For our system and this size of problem, rough balance is achieved with 
about 65% of the logical qubits dedicated to the IS") factory. 
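The timing entries of Table 2 chain together arithmetically, and the following Python sketch retraces that chain from the pulse rate to the adder-limited run time. It is an illustrative check, not the authors' model: the names are ours, the factor of 5d lattice cycles per braid is our reading of the table entry, and the inputs are the rounded values quoted in the table, so the results should agree with the tabulated figures only to within a few percent.

```python
import math

# Rounded inputs as quoted in Table 2 (identifier names are illustrative).
CHIP_COLUMNS, CHIP_ROWS = 128, 770   # chip lattice, C x R
H_CHIPS, V_CHIPS = 65536, 1          # multicomputer setup, H x V
PULSE_RATE_HZ = 10e9                 # purification/entanglement pulse rate
PULSES_PER_CYCLE = 4.9e5             # average pulses per lattice cycle
HOLE_SEPARATION_D = 14               # hole separation constant d
T_TOFFOLI_S = 48e-3                  # Toffoli gate time
N_BITS = 2048                        # length of the number to be factored


def physical_lattice_qubits() -> int:
    """Total lattice qubits: (C * H) columns by (R * V) rows."""
    return CHIP_COLUMNS * H_CHIPS * CHIP_ROWS * V_CHIPS


def lattice_cycle_time_s() -> float:
    """t_lat = P_lat * t_pulse, with t_pulse = 1 / pulse rate."""
    return PULSES_PER_CYCLE / PULSE_RATE_HZ


def braid_time_s() -> float:
    """t_braid = 5 d t_lat (assumed reading of the table entry)."""
    return 5 * HOLE_SEPARATION_D * lattice_cycle_time_s()


def adder_time_s() -> float:
    """t_add = 4 log2(n) t_tof for the carry-lookahead adder."""
    return 4 * math.log2(N_BITS) * T_TOFFOLI_S


def algorithm_time_days() -> float:
    """Serial execution of 4 n^2 adder calls, converted to days."""
    adder_calls = 4 * N_BITS ** 2
    return adder_calls * adder_time_s() / 86400.0


if __name__ == "__main__":
    print(f"lattice qubits : {physical_lattice_qubits():.3e}")
    print(f"t_lat          : {lattice_cycle_time_s() * 1e6:.1f} usec")
    print(f"t_braid        : {braid_time_s() * 1e3:.2f} msec")
    print(f"t_add          : {adder_time_s():.2f} s")
    print(f"algorithm time : {algorithm_time_days():.0f} days")
```

With these rounded inputs the run time comes out near 410 days, in line with the 409 days listed in the table.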

The multicomputer organization is wide and shallow, to minimize refresh cycle time. 
Fig. 7. Factoring time for a 2,048-bit number using Shor's factoring algorithm. (a) Our baseline
proposal, with 40% yield, 0.1 dB W connections and 0.4 dB X connections, can be improved by
increasing the size and application-level parallelism of the system. Improving yield above 40%
reduces necessary resources only moderately, but raising the fidelity of the base-level entangled
pairs has a major impact on both system size and performance. (b) Achieving low-loss connections
is critical to performance.



Once we have decided to limit V to 1, the detailed chip layout simplifies, allowing the 
serpentine waveguide shown in Fig. 5. In this architecture, W connections are high fidelity,
there are no V neighbors (Xx,i connections), and connections to neighboring columns 
need not leave the chip except at chip boundaries. The n_X2 from Table 1 is still V R_f / s,
but physical connections are X connections with a loss of only about 0.4 dB. The vertical
height of a single chip will only accommodate enough cavities for a direct-mapped lattice, 
s = 1. 

Figure 7 shows the execution time for our proposed system. A 2048-bit number should
be factorable in just over 400 days, if the technological characteristics in Table 2 can be
met. The system is large, requiring more than six billion lattice qubits and several times that 
total number when ancillae and transceivers are included. At the application level, much 
more parallelism is available if a larger system is built. A system one hundred times larger 
would factor the number in about five days. 
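As a rough sanity check on this scaling claim, the sketch below assumes ideal linear speedup with system size. That is an optimistic assumption of ours, not a model taken from the paper, and the function name is illustrative. Under it, a hundredfold larger machine would need about four days, slightly below the roughly five days quoted above, as one would expect if the available parallelism is not perfectly efficient.

```python
BASELINE_DAYS = 409.0  # final execution time listed in Table 2


def ideal_factoring_days(scale: float, baseline_days: float = BASELINE_DAYS) -> float:
    """Run time if execution time shrank exactly in proportion to system size."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    return baseline_days / scale


for scale in (1, 10, 100):
    print(f"{scale:>4}x system: {ideal_factoring_days(scale):7.2f} days (ideal bound)")
```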

Figure 7b shows execution time as a function of the loss in our two key connection
types, the intra-column W connections and the inter-column X connections. Minimizing 
the additional loss incurred in inter-column travel helps hold execution time within reason- 
able bounds. 

Reaching toward the desirable lower left corner of Fig. 7a requires improving the
base-level entanglement fidelity or reducing the number of pulses used to purify Bell pairs.
Our system is fairly robust to yield. Below 40% it is difficult to build a system capable of 
running tQEC, but above that level, increasing yield has only minor effects on temporal and 
spatial resources. This gives a clear message: pursue fidelity and quality of components at 
the expense of yield. 
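To make the yield dependence concrete, the sketch below tabulates the effective yield and functional column height for several physical yields. It assumes the Table 2 relation y_e = y_p (1 - (1 - y_p)^2), where the exponent 2 is inferred from the tabulated 25.6% at y_p = 40%, together with R_f = R y_e for R = 770. The function names are illustrative, and the sketch does not model why tQEC becomes difficult below 40% yield.

```python
CHIP_ROWS = 770  # R, rows per chip column (Table 2)


def effective_yield(y_p: float) -> float:
    """Assumed Table 2 relation: y_e = y_p * (1 - (1 - y_p)^2)."""
    if not 0.0 <= y_p <= 1.0:
        raise ValueError("physical yield must lie in [0, 1]")
    return y_p * (1.0 - (1.0 - y_p) ** 2)


def functional_column_height(y_p: float, rows: int = CHIP_ROWS) -> int:
    """R_f = R * y_e, rounded down to a whole number of lattice rows."""
    return int(rows * effective_yield(y_p))


for y_p in (0.3, 0.4, 0.5, 0.6, 0.8):
    print(f"y_p = {y_p:.0%}  y_e = {effective_yield(y_p):6.1%}  "
          f"R_f = {functional_column_height(y_p)}")
```

At the baseline yield of 40% this reproduces the tabulated effective yield of 25.6% and a functional column height of 197.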



6. Discussion

Our design focuses on the communications within a quantum computer, building on a natu- 
ral hierarchy of connectivity ranging from direct coupling of neighbors on one physical axis 
of our chip through medium-fidelity, waveguide-based purification coupling on the other 
axis, to distant, switched connections requiring substantial purification. Thus, while we re- 
fer to our design as a quantum multicomputer with each node consisting of a single chip, 
it is more accurate to regard the connections between qubits as occurring on a set of levels 
rather than a simple internal/external distinction. Founded on quantum dots connected via 
cavity QED and nanophotonic waveguides and using topological error correction, this pro- 
posal represents progress toward a practical quantum computer architecture. The physical 
technologies are maturing rapidly, and tQEC offers both operational flexibility and a high 
threshold on realistic architectures such as ours. 

While the overall architecture (multicomputer) and the system building blocks (tQEC, 
purification circuits, etc.) have been established, much work remains to be done. The most 
important pending decision is the actual choice of semiconductor and quantum dot type. 
The cavity Q and memory lifetime, which dramatically affect our ability to build and main- 
tain the lattice cluster state, will be critical factors in this decision. The yield of functional 
qubits will ultimately drive the types of experiments that are feasible. 

With the decision of semiconductor and the key technical parameters in hand, it will 
become possible to more quantitatively analyze the mid-level design choices of node size, 
layout tradeoffs, and the numbers of required lasers and photodiodes. The control system 
for managing the qubits and cavity coupling will be a large engineering effort involving 
optics, electronic circuits, and possibly micromechanical elements. Finally, application al- 
gorithms need to be implemented and optimized and run-time systems deployed, which 
will require the creation of large software tool suites. 

One of our goals in this work is to establish target values for experimental parameters
that must be achieved for such a large system to work. For the chip design and system
configuration we present here, we estimate that the yield of functional quantum dots must
be at least 40%, the local optical loss must be below 0.02%, the adjusted gate error
rate below 0.2%, and the memory coherence time about 50 milliseconds or more. The
exact values of these goals depend on the architecture, system scale, and application; the
entire system is summarized in Table 2.
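The four targets above can be read as a simple acceptance test for candidate hardware. The following sketch encodes them as thresholds and reports which ones a hypothetical device misses. The class and field names are our own, the numerical limits are the ones quoted in the text, and treating a value exactly at a limit as acceptable is an assumption made here for simplicity.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DeviceParams:
    """Hypothetical measured parameters; field names are illustrative."""
    dot_yield: float           # fraction of functional quantum dots
    local_optical_loss: float  # fractional local optical loss
    gate_error_rate: float     # adjusted gate error rate
    coherence_time_s: float    # memory coherence time, in seconds


# Targets quoted in the text for this chip design and system configuration.
MIN_DOT_YIELD = 0.40
MAX_LOCAL_OPTICAL_LOSS = 0.0002   # 0.02%
MAX_GATE_ERROR_RATE = 0.002       # 0.2%
MIN_COHERENCE_TIME_S = 0.050      # about 50 milliseconds


def unmet_targets(p: DeviceParams) -> List[str]:
    """Return the names of the targets that the device fails to meet."""
    failures = []
    if p.dot_yield < MIN_DOT_YIELD:
        failures.append("dot_yield")
    if p.local_optical_loss > MAX_LOCAL_OPTICAL_LOSS:
        failures.append("local_optical_loss")
    if p.gate_error_rate > MAX_GATE_ERROR_RATE:
        failures.append("gate_error_rate")
    if p.coherence_time_s < MIN_COHERENCE_TIME_S:
        failures.append("coherence_time_s")
    return failures


if __name__ == "__main__":
    candidate = DeviceParams(0.35, 0.0001, 0.003, 0.060)
    print(unmet_targets(candidate))
```

Because the exact targets depend on the architecture, system scale, and application, the constants would need to be revisited for any other configuration.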

As a final comment, the physical resources demanded by this architecture are daunting. 
Other architectures for quantum computers are comparably daunting. The current work is 
intended in large part to reveal the scope of the problem. With realistic resources such as 
lossy waveguides, finite-yield qubits, and finite chip-sizes, the added overhead for error 
correction makes quantum computers very expensive by current standards. We must rely 
on engineering advancements to improve nanophotonic and quantum dot devices as well as 
VLSI-like manufacturing capabilities to realize a quantum computer with a realistic cost. 
Indeed, our current understanding of how to make very large quantum computers is of- 
ten likened to classical computers before VLSI techniques were developed. The successful 
technologies enabling practical approaches to building large computers are likely yet to be 
discovered, but architectures such as the one we have presented and the defect-tolerant,
communication-oriented design principles we have used are expected to provide the
guiding context for these new technologies.